This paper explores a relation between various approximation problems (arising from fitting linear models
to data) and corresponding statistical measures (norm statistics). It is established that for any optimal solution to
an approximation problem defined with respect to a norm, the resulting residuals have zero as their norm 
statistic. This result holds whenever the underlying design matrix has a column of ones. An extension to the case 
of arbitrary design matrices is also considered. 
1. Motivation

In a paper¹ discussing alternative criteria to least squares for the fitting of linear models to data, Appa
and Smith [1]² derive certain properties of solutions to $L_1$ approximation problems (i.e., curve-fitting
problems in which the sum of absolute deviations is minimized). In particular, Property 2 of [1] characterizes
the sign pattern of the residuals $e_i = y_i - b_0 - \sum_{j=1}^{m} b_j x_{ij}$ corresponding to an optimal solution
$(b_0, \ldots, b_m)$ to an $L_1$ approximation problem with independent variables $x_1, \ldots, x_m$ and dependent
variable $y$. The result of Appa and Smith states that $|N_1 - N_2| \le m + 1$, where $N_1$ and $N_2$ denote,
respectively, the number of positive residuals and the number of negative residuals corresponding to any
optimal $L_1$ solution.

This observation admits of a slight generalization [4]: namely, $|N_1 - N_2| \le Z$, where $Z$ indicates the
number of zero-valued residuals in the given optimal solution. (The assumption employed in [1] to eliminate
degeneracy ensures that $Z \le m + 1$, and thus the result of Appa and Smith follows immediately from the
above inequality.)

It is straightforward to show that $|N_1 - N_2| \le Z$ is equivalent to the statement that the residuals in an
optimal $L_1$ solution have a median of zero. Recall that a median of some set of observations is any value that
exceeds at most half the observed numbers, and is exceeded by at most half the observed numbers. From
this definition it immediately follows that a median of the numbers $u_1, \ldots, u_n$ (not necessarily distinct) is
any value $\xi$ such that

$$N_1(\xi) + Z(\xi) \ge N_2(\xi) \tag{1}$$

and

$$N_2(\xi) + Z(\xi) \ge N_1(\xi), \tag{2}$$

where $N_1(\xi) = \mathrm{card}\{i : u_i > \xi\}$, $N_2(\xi) = \mathrm{card}\{i : u_i < \xi\}$, and $Z(\xi) = \mathrm{card}\{i : u_i = \xi\}$. Hence, zero is a
median of the residuals $e_1, \ldots, e_n$ if and only if $N_1 + Z \ge N_2$ and $N_2 + Z \ge N_1$. But the latter two
inequalities are clearly equivalent to $|N_1 - N_2| \le Z$.
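The equivalence between conditions (1) and (2) at $\xi = 0$ and the sign-pattern inequality can be checked on a small example. The following is a minimal sketch; the function names and the sample residuals are illustrative assumptions and do not come from the references.

```python
def sign_counts(values, xi=0.0):
    """Return (N1, N2, Z): counts of values above, below, and equal to xi."""
    n1 = sum(1 for u in values if u > xi)
    n2 = sum(1 for u in values if u < xi)
    z = sum(1 for u in values if u == xi)
    return n1, n2, z


def is_median(values, xi=0.0):
    """Check conditions (1) and (2): N1 + Z >= N2 and N2 + Z >= N1."""
    n1, n2, z = sign_counts(values, xi)
    return n1 + z >= n2 and n2 + z >= n1


def satisfies_sign_pattern(values):
    """Check the sign-pattern inequality |N1 - N2| <= Z at xi = 0."""
    n1, n2, z = sign_counts(values, 0.0)
    return abs(n1 - n2) <= z


# Hypothetical residuals, chosen only for illustration.
residuals = [-2.0, -0.5, 0.0, 0.0, 1.5, 3.0, 4.0]
print(sign_counts(residuals))             # (3, 2, 2)
print(is_median(residuals))               # True
print(satisfies_sign_pattern(residuals))  # True
```

On any list of numbers the two predicates agree, which is exactly the equivalence stated above.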

The point to be emphasized here is that the sign pattern result³ $|N_1 - N_2| \le Z$ is equally a statement
about zero being a median of certain residuals. Such a result brings to mind a related statement about the
residuals for solutions to $L_2$ (least squares) approximation problems: namely, the mean of the residuals,
derived from an optimal $L_2$ solution, is zero. Likewise for $L_\infty$ approximation problems (in which the object is
to minimize the maximum absolute deviation), it is known that the midrange [6] of the residuals in an
optimal $L_\infty$ solution is zero. One wonders whether these facts might not be separate manifestations of a
general relationship between approximation problems and corresponding statistical measures. Such a general
relationship indeed exists and will be explored in the subsequent sections. The proof of this relationship is
extremely simple, simpler than the proofs for the special $L_1$ and $L_2$ cases we have found in the literature. The
results of this paper therefore provide both simplification and unification.

¹ This paper is also commented upon in the short communication [3] of Gentle et al.
² Figures in brackets indicate the literature references at the end of this paper.
³ It is also easy to show that when $n$ is odd, a slightly stronger result obtains: $|N_1 - N_2| \le Z - 1$. Indeed, since $N_1 + N_2 + Z = n$ is odd, the parity (even, odd) of $N_1 + N_2$,
and thus of $N_1 - N_2$, is the same as the parity of $Z - 1$. Accordingly, $|N_1 - N_2| \le Z$ is equivalent to $|N_1 - N_2| \le Z - 1$ when $n$ is odd.



2. Norm Approximation Problems 

Suppose that $n$ sets of observations are available on a single dependent variable $y$ and $m \ge 0$
independent variables $x_1, \ldots, x_m$. Such observations can be arranged in a column vector $\mathbf{y} = (y_1, \ldots, y_n)^T$
and an $n \times m$ matrix $X = (x_{ij})$, where $y_i, x_{i1}, \ldots, x_{im}$ represent observations in the $i$th set. Then the $L_p$
approximation problem [2], $1 \le p \le \infty$, is that of finding values $b_0, b_1, \ldots, b_m$ that minimize



$$\left[ \sum_{i=1}^{n} \Bigl| y_i - b_0 - \sum_{j=1}^{m} b_j x_{ij} \Bigr|^p \right]^{1/p} \tag{3}$$

over all $b_0, b_1, \ldots, b_m$. For the case $p = 1$, the problem is that of minimizing the sum of the absolute
values of the deviations by choice of parameters $b_0, b_1, \ldots, b_m$. When $p = 2$, the above formulation
presents the familiar problem of curve-fitting by least squares. In the case $p = \infty$, the objective function in
(3) becomes $\max_i |y_i - b_0 - \sum_{j=1}^{m} b_j x_{ij}|$, and we have the linear Chebyshev approximation problem. Every
such $L_p$ approximation problem can in fact be formulated [2] as a mathematical programming problem with a
convex objective function and linear constraints.
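To make the three cases concrete, the objective can be evaluated directly for a candidate parameter vector. The sketch below is only an illustration: the function name `lp_objective`, the sample data, and the candidate line are assumptions introduced here, and the function evaluates the objective without solving the minimization.

```python
def lp_objective(params, X, y, p):
    """Value of objective (3) for params = (b0, b1, ..., bm).

    X is a list of n rows of length m, y is a list of n values, and
    p is a number >= 1 or float("inf") for the Chebyshev case.
    """
    b0, b = params[0], params[1:]
    deviations = [
        abs(yi - b0 - sum(bj * xij for bj, xij in zip(b, row)))
        for row, yi in zip(X, y)
    ]
    if p == float("inf"):
        return max(deviations)
    return sum(d ** p for d in deviations) ** (1.0 / p)


# Hypothetical data with one independent variable (m = 1).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 2.0, 5.0]
params = (1.0, 1.0)  # candidate line y = 1 + x, not an optimal fit

print(lp_objective(params, X, y, 1))             # 3.0 (sum of absolute deviations)
print(lp_objective(params, X, y, 2))             # square root of 3
print(lp_objective(params, X, y, float("inf")))  # 1.0 (largest absolute deviation)
```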

A problem more general than that described by the objective function (3) is the weighted $L_p$
approximation problem, where $1 \le p < \infty$. Given nonnegative weights $w_1, \ldots, w_n$, this problem concerns
finding parameter values $b_0, b_1, \ldots, b_m$ to minimize

$$\left[ \sum_{i=1}^{n} w_i \Bigl| y_i - b_0 - \sum_{j=1}^{m} b_j x_{ij} \Bigr|^p \right]^{1/p}. \tag{4}$$

The inclusion of weights in the above may reflect, for example, identical observations as well as differing 
degrees of confidence (or measures of importance) to be attached to the observed data points. 

An even more general approximation problem can be formulated in the present context with respect to 
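any norm. A norm $N(\mathbf{x})$ is defined on vectors $\mathbf{x}$ and is assumed to have the following properties [5]:

$N(\mathbf{x}) > 0$ unless $\mathbf{x} = \mathbf{0}$,

$N(\lambda \mathbf{x}) = \lambda N(\mathbf{x})$, for $\lambda > 0$,

$N(\mathbf{x} + \mathbf{y}) \le N(\mathbf{x}) + N(\mathbf{y})$.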

Let $\mathbf{b} = (b_1, \ldots, b_m)^T$ and form the residuals $\mathbf{e} = \mathbf{y} - b_0 \mathbf{1} - X\mathbf{b}$, where $\mathbf{1} = (1, \ldots, 1)^T$. Then the
norm approximation problem is that of finding $(b_0, \mathbf{b})$ to minimize

$$N(\mathbf{e}) = N(\mathbf{y} - b_0 \mathbf{1} - X\mathbf{b}). \tag{5}$$

The objective function (3) is a special case of (5) with $N(\mathbf{e}) = N(e_1, \ldots, e_n) = \bigl[\sum_{i=1}^{n} |e_i|^p\bigr]^{1/p}$, while (4) is
also a special case with $N(\mathbf{e}) = \bigl[\sum_{i=1}^{n} w_i |e_i|^p\bigr]^{1/p}$.

It can readily be shown that $N(\mathbf{e})$ is a convex function of $(b_0, \mathbf{b})$, and thus the approximation problem
described by (5) is well behaved: any local minimum to this problem is also guaranteed to be a global
minimum.
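The general objective (5) and its convexity can be illustrated with a small numerical spot check. This is a sketch under stated assumptions: the helper names, the weights, the data, and the two parameter vectors are all hypothetical, and a single midpoint comparison illustrates convexity without proving it.

```python
def weighted_lp_norm(weights, p):
    """Return N with N(e) = (sum_i w_i |e_i|^p)^(1/p), for 1 <= p < infinity."""
    def N(e):
        return sum(w * abs(ei) ** p for w, ei in zip(weights, e)) ** (1.0 / p)
    return N


def residuals(params, X, y):
    """Residual vector e = y - b0*1 - X b for params = (b0, b1, ..., bm)."""
    b0, b = params[0], params[1:]
    return [yi - b0 - sum(bj * xij for bj, xij in zip(b, row))
            for row, yi in zip(X, y)]


def norm_objective(N, params, X, y):
    """Objective (5): N(y - b0*1 - X b)."""
    return N(residuals(params, X, y))


# Hypothetical data and weights.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 2.0, 5.0]
N = weighted_lp_norm([1.0, 2.0, 1.0, 0.5], p=3)

# Midpoint convexity spot check between two arbitrary parameter vectors.
a, c = (0.0, 2.0), (2.0, 0.5)
mid = tuple((ai + ci) / 2 for ai, ci in zip(a, c))
lhs = norm_objective(N, mid, X, y)
rhs = 0.5 * norm_objective(N, a, X, y) + 0.5 * norm_objective(N, c, X, y)
print(lhs <= rhs + 1e-12)  # True
```

Passing the norm in as a function keeps the objective independent of any particular choice of $N$, which mirrors how (5) is stated.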
3. Norm Statistics

The discussion in section 1 indicated that certain statistics (namely, the median, mean and midrange)
were useful in describing properties of certain $L_p$ approximation problems. Specifically, the residuals of an
optimal $L_1$ solution have a median of zero, the residuals of an $L_2$ solution have a mean of zero, and the
residuals of an $L_\infty$ solution have a midrange of zero. Moreover, it is well known that these three statistics
themselves solve appropriate one-dimensional $L_p$ approximation problems.

For example, the median of a set of values $u_1, \ldots, u_n$ is a value $v$ that minimizes $\sum_{i=1}^{n} |u_i - v|$ over
all possible $v$. That is, a median solves an $L_1$ approximation problem with one parameter. Similarly, the
mean of $u_1, \ldots, u_n$ minimizes $\sum_{i=1}^{n} |u_i - v|^2$, and thus also $\bigl[\sum_{i=1}^{n} |u_i - v|^2\bigr]^{1/2}$. Accordingly, the mean
solves a one-parameter $L_2$ problem. Finally, the midrange minimizes $\max_i |u_i - v|$, an $L_\infty$ approximation
problem, again with one parameter. As suggested by the above examples, we define a p-statistic of
$u_1, \ldots, u_n$ to be a value $v$ that minimizes

$$\left[ \sum_{i=1}^{n} |u_i - v|^p \right]^{1/p},$$



where $1 \le p \le \infty$. This definition follows that given by Rice and White [7], who refer to such a value as an
"$L_p$ estimate." In similar fashion, a weighted p-statistic of $u_1, \ldots, u_n$ is defined to be a value $v$ that
minimizes

$$\left[ \sum_{i=1}^{n} w_i |u_i - v|^p \right]^{1/p},$$

where the nonnegative weights $w_i$ are given and $1 \le p < \infty$. Such a concept generalizes, for example, the
idea of a weighted mean or a weighted median.
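The three classical cases can be checked numerically by comparing each closed-form statistic with a grid of alternative values $v$. This is a rough sketch: the function name `lp_distance`, the sample values, and the grid are illustrative assumptions, and a grid comparison is evidence rather than a proof.

```python
import statistics


def lp_distance(u, v, p):
    """Objective of a p-statistic: (sum_i |u_i - v|^p)^(1/p), or max_i |u_i - v| if p is infinite."""
    devs = [abs(ui - v) for ui in u]
    if p == float("inf"):
        return max(devs)
    return sum(d ** p for d in devs) ** (1.0 / p)


# Hypothetical sample values.
u = [1.0, 2.0, 4.0, 7.0, 11.0]
candidates = {
    1: statistics.median(u),              # 4.0, the median
    2: statistics.mean(u),                # 5.0, the mean
    float("inf"): (min(u) + max(u)) / 2,  # 6.0, the midrange
}

# Compare each closed-form statistic against a grid of alternatives in [0, 12].
grid = [i / 100.0 for i in range(0, 1201)]
for p, v_star in candidates.items():
    best_on_grid = min(lp_distance(u, v, p) for v in grid)
    print(p, v_star, lp_distance(u, v_star, p) <= best_on_grid + 1e-9)
```

Each line printed should end in `True`, meaning no grid point does better than the corresponding statistic.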

Finally, let $N$ be a norm as defined in section 2. Then a norm statistic, or an N-statistic, for $\mathbf{u} =
(u_1, \ldots, u_n)^T$ is defined to be a value $v$ that minimizes $N(\mathbf{u} - v\mathbf{1})$. Clearly, the concept of an N-statistic
includes as special cases both p-statistics and weighted p-statistics.
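Since $v \mapsto N(\mathbf{u} - v\mathbf{1})$ is convex for any norm, an N-statistic can be approximated by a one-dimensional search. The sketch below uses ternary search, which is a technique chosen here for illustration and not one proposed in the text. It also assumes that a minimizer lies between the smallest and largest entries of $\mathbf{u}$, which holds for the two norms in the example but is not claimed for every norm.

```python
def n_statistic(N, u, tol=1e-10):
    """Approximate a minimizer v of N(u - v*1) by ternary search.

    Assumes v -> N(u - v*1) is convex and that a minimizer lies in
    [min(u), max(u)]; both are assumptions of this sketch.
    """
    lo, hi = min(u), max(u)
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if N([ui - m1 for ui in u]) <= N([ui - m2 for ui in u]):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return (lo + hi) / 2


# Hypothetical values and weights.
u = [0.0, 3.0, 6.0]
w = [1.0, 1.0, 2.0]


def weighted_l2(e):
    """Weighted 2-norm with the weights w above."""
    return sum(wi * ei ** 2 for wi, ei in zip(w, e)) ** 0.5


def sup_norm(e):
    """Maximum absolute entry."""
    return max(abs(ei) for ei in e)


print(round(n_statistic(weighted_l2, u), 4))  # 3.75, the weighted mean
print(round(n_statistic(sup_norm, u), 4))     # 3.0, the midrange
```

The weighted mean here is $(1 \cdot 0 + 1 \cdot 3 + 2 \cdot 6)/4 = 3.75$, so the search reproduces the weighted 2-statistic to within numerical tolerance.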

4. Norm Approximation Problems and N-Statistics 

This section contains the main result relating N-statistics and norm approximation problems.

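Theorem: Let $(b_0, \mathbf{b})$ be an optimal solution to the norm approximation problem (5), and let $\mathbf{e} = \mathbf{y} - b_0 \mathbf{1}
- X\mathbf{b}$. Then zero is an N-statistic for the residuals $\mathbf{e}$.

Proof:

$$
\begin{aligned}
N(\mathbf{e} - 0 \cdot \mathbf{1}) &= N(\mathbf{e}) \\
&= N(\mathbf{y} - b_0 \mathbf{1} - X\mathbf{b}) \\
&\le N(\mathbf{y} - [b_0 + v]\mathbf{1} - X\mathbf{b}) && \text{for all } v \\
&= N(\mathbf{y} - b_0 \mathbf{1} - X\mathbf{b} - v\mathbf{1}) && \text{for all } v \\
&= N(\mathbf{e} - v\mathbf{1}) && \text{for all } v.
\end{aligned}
$$

The third line above holds because $(b_0, \mathbf{b})$ minimizes (5). The resulting inequality $N(\mathbf{e} - 0 \cdot \mathbf{1}) \le N(\mathbf{e} - v\mathbf{1})$,
for all $v$, shows that 0 minimizes $N(\mathbf{e} - v\mathbf{1})$, and so 0 is an N-statistic for $\mathbf{e}$. This completes the proof.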

Notice that in the proof above, we did not at all need the norm properties of $N$. As a matter of fact, $N$
could have been an arbitrary function; in this case, the theorem applies to a global solution (if it exists) to a
very general approximation problem.
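The theorem can be checked numerically for $p = 1$ and $p = 2$ with a single independent variable. The following is a sketch under stated assumptions. The $L_1$ fit is found by brute force over lines through pairs of data points, relying on the standard linear-programming fact that some optimal $L_1$ line interpolates at least two points; that fact is taken as an assumption here and is not derived in this paper. The function names and the data are hypothetical.

```python
from itertools import combinations


def l1_line_fit(x, y):
    """Brute-force L1 fit of y = b0 + b1*x over lines through two data points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not of the form y = b0 + b1*x
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        cost = sum(abs(yk - b0 - b1 * xk) for xk, yk in zip(x, y))
        if best is None or cost < best[0]:
            best = (cost, b0, b1)
    return best[1], best[2]


def l2_line_fit(x, y):
    """Closed-form least squares fit of y = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xk - xbar) * (yk - ybar) for xk, yk in zip(x, y))
    sxx = sum((xk - xbar) ** 2 for xk in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def line_residuals(b0, b1, x, y):
    """Residuals e_i = y_i - b0 - b1*x_i."""
    return [yk - b0 - b1 * xk for xk, yk in zip(x, y)]


def zero_is_median(e, tol=1e-9):
    """Test |N1 - N2| <= Z, treating |e_i| <= tol as a zero residual."""
    n1 = sum(1 for ek in e if ek > tol)
    n2 = sum(1 for ek in e if ek < -tol)
    z = len(e) - n1 - n2
    return abs(n1 - n2) <= z


# Hypothetical data containing one outlier.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 2.0, 1.0, 4.0, 10.0]

e1 = line_residuals(*l1_line_fit(x, y), x, y)
e2 = line_residuals(*l2_line_fit(x, y), x, y)

print(zero_is_median(e1))                # True: zero is a median of the L1 residuals
print(abs(sum(e2) / len(e2)) < 1e-12)    # True: the L2 residuals have mean zero
```

The tolerance in `zero_is_median` is a design choice of this sketch: residuals at the interpolated points are zero only up to rounding, so exact comparison with zero would undercount $Z$.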

5. Arbitrary Design Matrices 

A further generalization of the above theorem is possible for weighted L p approximation problems. The 
extension of interest allows an arbitrary "design matrix," where a column of 1's is not necessarily imposed.
In such a problem, the object is to find $\mathbf{b} = (b_0, \ldots, b_m)$ such that

$$\left[ \sum_{i=1}^{n} w_i \Bigl| y_i - \sum_{j=0}^{m} b_j x_{ij} \Bigr|^p \right]^{1/p} \tag{6}$$

is minimized.



Extension: Let $\mathbf{b}$ be an optimal solution to (6), and let $\mathbf{e} = \mathbf{y} - X\mathbf{b}$. Then zero is a weighted p-statistic ($1
\le p < \infty$) for the values $\{e_i / x_{i0} : x_{i0} \ne 0,\ i = 1, \ldots, n\}$ with weights $w_i |x_{i0}|^p$.

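Proof:

$$
\begin{aligned}
\sum_{i=1}^{n} w_i |e_i - 0 \cdot x_{i0}|^p &= \sum_{i=1}^{n} w_i |e_i|^p \\
&= \sum_{i=1}^{n} w_i \Bigl| y_i - \sum_{j=0}^{m} b_j x_{ij} \Bigr|^p \\
&= \sum_{i=1}^{n} w_i \Bigl| y_i - b_0 x_{i0} - \sum_{j=1}^{m} b_j x_{ij} \Bigr|^p \\
&\le \sum_{i=1}^{n} w_i \Bigl| y_i - [b_0 + v] x_{i0} - \sum_{j=1}^{m} b_j x_{ij} \Bigr|^p \\
&= \sum_{i=1}^{n} w_i |e_i - v x_{i0}|^p .
\end{aligned}
$$

The terms with $x_{i0} = 0$ are identical on both sides and cancel. Thus, if we define $T = \{i : x_{i0} \ne 0\}$, the above inequality gives

$$\sum_{i \in T} w_i |e_i - 0 \cdot x_{i0}|^p \le \sum_{i \in T} w_i |e_i - v x_{i0}|^p,$$

that is,

$$\sum_{i \in T} w_i |x_{i0}|^p \left| \frac{e_i}{x_{i0}} - 0 \right|^p \le \sum_{i \in T} w_i |x_{i0}|^p \left| \frac{e_i}{x_{i0}} - v \right|^p .$$

Upon taking the $p$th root ($1 \le p < \infty$) of both sides, we conclude that zero is a weighted p-statistic for
$\{e_i / x_{i0} : x_{i0} \ne 0\}$ with weights $w_i |x_{i0}|^p$.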

Notice that in the proof above, the choice of the first column, corresponding to the $x_{i0}$'s, is clearly
arbitrary. Any column of the design matrix can be used with similar result.
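For $p = 2$ the extension says that the weighted mean of the ratios $e_i / x_{i0}$, with weights $w_i x_{i0}^2$, is zero. A minimal sketch of a numerical check follows, assuming a two-column design matrix without a column of 1's. The function name, the data, and the weights are hypothetical, and the fit uses the 2x2 normal equations, which is simply one convenient way to solve this small weighted least squares problem.

```python
def weighted_ls_two_columns(x0, x1, y, w):
    """Minimize sum_i w_i (y_i - b0*x0_i - b1*x1_i)^2 via the 2x2 normal equations."""
    a00 = sum(wi * a * a for wi, a in zip(w, x0))
    a01 = sum(wi * a * c for wi, a, c in zip(w, x0, x1))
    a11 = sum(wi * c * c for wi, c in zip(w, x1))
    r0 = sum(wi * a * yi for wi, a, yi in zip(w, x0, y))
    r1 = sum(wi * c * yi for wi, c, yi in zip(w, x1, y))
    det = a00 * a11 - a01 * a01  # assumed nonzero (columns not collinear)
    return (r0 * a11 - r1 * a01) / det, (a00 * r1 - a01 * r0) / det


# Hypothetical design matrix columns, observations, and weights.
x0 = [2.0, 0.0, -1.0, 3.0, 1.0]  # note the zero entry, which is excluded from T
x1 = [1.0, 2.0, 0.5, -1.0, 4.0]
y = [3.0, 1.0, -2.0, 5.0, 6.0]
w = [1.0, 2.0, 0.5, 1.0, 3.0]

b0, b1 = weighted_ls_two_columns(x0, x1, y, w)
e = [yi - b0 * a - b1 * c for yi, a, c in zip(y, x0, x1)]

# Ratios e_i / x_i0 over T = {i : x_i0 != 0}, with weights w_i * x_i0^2.
T = [i for i, a in enumerate(x0) if a != 0.0]
ratios = [e[i] / x0[i] for i in T]
ratio_weights = [w[i] * x0[i] ** 2 for i in T]

weighted_mean = sum(rw * r for rw, r in zip(ratio_weights, ratios)) / sum(ratio_weights)
print(abs(weighted_mean) < 1e-9)  # True
```

The check works because the weighted sum of the ratios reduces to $\sum_i w_i x_{i0} e_i$, which vanishes by the normal equation for $b_0$. Swapping the roles of `x0` and `x1` gives the same result for the second column, in line with the remark above.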